Scaling Laws for Neural Language Models